Abstract



We consider the problem of privately answering queries defined on databases which are collections
of points belonging to some metric space. We give simple, computationally efficient algorithms for
answering distance queries defined over an arbitrary metric. Distance queries are specified by points
in the metric space, and ask for the average distance from the query point to the points contained in
the database, according to the specified metric. Our algorithms run efficiently in the database size and
the dimension of the space, and operate in both the online query release setting and the offline setting,
in which they must in polynomial time generate a fixed data structure which can answer all queries of
interest. This represents one of the first subclasses of linear queries for which efficient algorithms are
known for the private query release problem, circumventing known hardness results for generic linear
queries.

1 Introduction 

Consider an online retailer who is attempting to recommend products to customers as they arrive. The 
retailer may have a great deal of demographic information about each customer, both from cookies and from 
data obtained from tracking networks. Moreover, the retailer will also have information about what other, 
demographically similar customers have purchased in the past. If the retailer can identify which cluster of 
customers the new arrival most resembles, then it can likely provide a useful set of recommendations. Note 
that this problem reduces to computing the average distance from the new arrival to past customers in each 
demographic cluster, where the distance metric may be complex and domain specific.¹

For legal reasons (i.e., to adhere to its stated privacy policy), or for public relations reasons, the retailer
may not want the recommendations given to some customer $i$ to reveal information about any specific past
customer $j \neq i$. Therefore, it would be helpful if the retailer could compute these distance queries while
guaranteeing that these computations satisfy differential privacy. Informally, this means that the distances
computed from each new customer to the demographic clusters should be insensitive to the data of any
single user in the database of past customers.

Distance queries are a subclass of linear queries, which are well studied in the differential privacy
literature [BLR08, DNR+09, DRV10, RR10, HR10]. For example, the data analyst could answer $k$ such
queries from an $\ell$-dimensional metric space, on a database of size $n$, using the private multiplicative weights
mechanism of Hardt and Rothblum [HR10] with error that scales as $O(\mathrm{poly}(\log k, \ell)/\sqrt{n})$.² However,
¹ Note that the most natural metric for this problem may not be defined by an $\ell_p$ norm, but may be something more combinatorial,
like edit distance on various categorical features.

² All of the mechanisms for answering linear queries [BLR08, DNR+09, RR10, DRV10, HR10, GHRU11, GRU12] are
defined over discrete domains $X$ and have an error dependence on $\log |X|$. In contrast, these queries are defined over continuous

none of these mechanisms is computationally efficient, and even for the best of these mechanisms, the
running time per query will be exponential in $\ell$, the dimension of the space. What's more, there is strong
evidence that there do not exist computationally efficient mechanisms that can usefully and privately answer
more than $O(n^2)$ general linear queries [DNR+09, Ull12]. A major open question in differential privacy
is to determine whether there exist interesting subclasses of linear queries for which efficient algorithms do 
exist. 

In this paper, we show that distance queries using an arbitrary metric are one such class. We give 
simple, efficient algorithms for answering exponentially many distance queries defined over any metric 
space with bounded diameter. In the online query release setting, our algorithms run in time nearly linear in 
the dimension of the space and the size of the private database per query. Our algorithms remain efficient 
even in the offline query release setting, in which the mechanism must in one shot (and with only polynomial 
running time) privately generate a synopsis which can answer all of the (possibly exponentially many) 
queries of interest. This represents one of the first high dimensional classes of linear queries which are 
known to have computationally efficient private query release mechanisms which can answer large numbers 
of queries. 

1.1 Our Techniques 

At a high level, our mechanism is based on the reduction from online learning algorithms to private query
release mechanisms developed in a series of papers [RR10, HR10, GHRU11, GRU12]. Specifically, we use
the fact that an online mistake-bounded learning algorithm for learning the function $F : C \to \mathbb{R}$, which
maps queries $f \in C$ to their answers $f(\mathcal{D})$ on the private database $\mathcal{D}$, generically gives the existence of a
private query release mechanism in the interactive setting, where the running time per query is equal to the
update time of the learning algorithm.

We observe that when the queries are metric distance queries over some continuous $\ell_p$ metric space
$X$, then $F : X \to \mathbb{R}$ is a convex, Lipschitz-continuous function. Motivated by this observation, we give
a simple mistake-bounded learning algorithm for learning arbitrary convex Lipschitz-continuous functions
over the unit interval $[0, 1]$ by approximating $F$ by successively finer piecewise linear approximations. Our
algorithm has a natural generalization to the $\ell$-dimensional rectangle $[0, 1]^\ell$, but unfortunately the mistake
bound of this generalization necessarily grows exponentially with $\ell$.

Instead, we observe that if $X = [0, 1]^\ell$ and is endowed with the $\ell_1$ metric, then $F$ can be decomposed
into $\ell$ one-dimensional functions $F_1, \dots, F_\ell$, each defined only over the unit interval $[0, 1]$. Hence, for the $\ell_1$
metric, our learning algorithm can be extended to $[0, 1]^\ell$ with only a linear increase in the mistake bound.
In other words, the $\ell_1$ metric is an easy metric for differential privacy. In fact, for $\ell_1$ distance queries, our
algorithm achieves per-query error $O(\mathrm{poly}(\log k, \ell)/n^{4/5})$, improving on the worst-case error guarantees
that would be given by inefficient generic query release mechanisms like [BLR08, HR10].

Finally, we show that our algorithm can be used to answer distance queries for any metric space that
can be embedded into $\mathrm{poly}(\ell)$-dimensional $\ell_1$ space using a low sensitivity embedding. A sensitivity-$s$
embedding is one that maps any pair of databases that differ in only 1 element into a pair of projected
databases that differ in only $s$ entries. Oblivious embeddings, such as the almost-isometric embedding from
$\ell_2$ into $\ell_1$, are 1-sensitive [FLM77, Ind06]. On the other hand, generic embeddings, such as the embedding
from an arbitrary metric space into $\ell_1$ that follows from Bourgain's theorem, can have sensitivity as high as
$n$ [Bou85, LLR95].

We observe, however, that for our purposes, we do not require that the embedding preserve distances 
between pairs of database points, or between pairs of query points, but rather only between database points 

$\ell$-dimensional domains, and so it is not clear that this previous work even applies. However, metric queries are Lipschitz, and so
these mechanisms can be run on a discrete grid with roughly $n^{\ell}$ points, giving a polynomial dependence on $\ell$ in the error bounds,
but an exponential dependence on $\ell$ in the running time.

and query points. Therefore, we are able to prove a variant of Bourgain's theorem, which only preserves
distances between query points and database points. This gives a 1-sensitive embedding from any metric space
into $\log k$ dimensional $\ell_1$ space, with distortion $\log k$, which works for any collection of $k$ distance queries.
In particular, this gives us an efficient offline algorithm for answering $k$ distance queries defined over an
arbitrary bounded diameter metric that has multiplicative error $O(\log k)$ and additive error $O(\mathrm{polylog}(k)/n^{4/5})$.
Our use of metric embeddings is novel in the context of differential privacy, and we believe that they will be
useful tools for developing efficient algorithms in the future as we identify other privacy-friendly metrics in
addition to $\ell_1$.



1.2 Related Work 

Differential privacy was developed in a series of papers [DN03, BDMN05, DMNS06], culminating in the
definition by Dwork, McSherry, Nissim, and Smith [DMNS06]. It is accompanied by a vast literature which
we do not attempt to survey.

Dwork et al. [DMNS06] also introduced the Laplace mechanism, which together with the composition
theorems of Dwork, Rothblum, and Vadhan [DRV10] gives an efficient, interactive method for privately
answering nearly $n^2$ arbitrary low-sensitivity queries on a database of size $n$ to non-trivial accuracy. On
the other hand, it has been known since Blum, Ligett, and Roth [BLR08] that it is information theoretically
possible to privately answer nearly exponentially many linear queries to non-trivial accuracy, but the
mechanism of [BLR08] is not computationally efficient. A series of papers [BLR08, DNR+09, DRV10,
RR10, HR10, GHRU11, GRU12] has extended the work of [BLR08], improving its accuracy, running time,
and generality. The state of the art is the private multiplicative weights mechanism of Hardt and Rothblum
[HR10]. However, even this mechanism has running time that is linear in the size of the data universe, or in
other words exponential in the dimension of the data. Finding algorithms which can achieve error bounds
similar to [BLR08, HR10] while running in time only polynomial in the size of the database and the data
dimension has been a major open question in the differential privacy literature since at least [BLR08], who
explicitly ask this question.

Unfortunately, a striking recent result of Ullman [Ull12], building on the beautiful work of Dwork, Naor,
Reingold, Rothblum, and Vadhan [DNR+09], shows that assuming the existence of one-way functions,
no polynomial time algorithm can answer more than $O(n^2)$ arbitrary linear queries. In other words, the
Laplace mechanism of [DMNS06] is nearly optimal among all computationally efficient algorithms for 
privately answering queries at a comparable level of generality. This result suggests that to make progress 
on the problem of computationally efficient private query release, we must abandon the goal of designing 
mechanisms which can answer arbitrary linear queries, and instead focus on classes of queries that have 
some particular structure that we can exploit. 

Before this work, there were very few efficient algorithms for privately releasing classes of "high
dimensional" linear queries with worst case error guarantees. Blum, Ligett, and Roth [BLR08] gave efficient
algorithms for two low dimensional classes of queries: constant dimensional axis aligned rectangles, and
large margin halfspaces.³ Feldman et al. gave efficient algorithms for releasing Euclidean $k$-medians queries
in a constant dimensional unit ball [FFKN09]. Note that when we restrict our attention to Euclidean metric
spaces, our queries correspond to 1-median queries. In contrast to [FFKN09], we can handle arbitrary
metrics, and our algorithms are efficient also in the dimension of the metric space. Blum and Roth [BR11] gave
an efficient algorithm for releasing linear queries defined over predicates with extremely sparse truth tables,
but such queries are very rare. Only slightly more is known for average case error. Gupta et al. [GHRU11]
gave a polynomial time algorithm for releasing the answers (to linear, but non-trivial error) to conjunctions,



³ Note that halfspace queries are in general high dimensional, but the large-margin assumption implies that the data has intrinsic
dimension only roughly $O(\log n)$, since the dimensionality of the data can be reduced using the Johnson-Lindenstrauss lemma
without affecting the value of any of the halfspace predicates.

where the error is measured in the average case on conjunctions drawn from a product distribution. Hardt,
Rothblum, and Servedio [HRS12] gave a polynomial time algorithm for releasing answers to parity queries,
where the error is measured in the average case on parities drawn from a product distribution. Although it is
known how to convert average case error to worst-case error using the private boosting technique of Dwork,
Rothblum, and Vadhan [DRV10], the boosting algorithm itself is not computationally efficient when the
class of queries is large, and so cannot be applied in this setting where we are interested in polynomial time
algorithms. For the special case of privately releasing conjunctions in $\ell$ dimensions, Thaler, Ullman, and
Vadhan [TUV12], building on the work of Hardt, Rothblum, and Servedio [HRS12], give an algorithm that
runs in time 0(2^), improving on the generic bound of $O(2^\ell)$. Finding a polynomial time algorithm for
releasing conjunctions remains an open problem.

Metric embeddings have proven to be a useful technique in theoretical computer science, particularly
when designing approximation algorithms. See [Ind01] for a useful survey. The specific embeddings
that we use in this paper are the nearly isometric embedding from $\ell_2$ into $\ell_1$ using random projections
[FLM77, Ind06], and a variant of Bourgain's theorem [Bou85, LLR95], which allows the embedding of an
arbitrary metric into $\ell_1$. Our use of metric embeddings is slightly different than its typical use in
approximation algorithms. Typically, metric embeddings are used to embed some problem into a metric in which
some optimization problem of interest is tractable. In our case, we are embedding metrics into $\ell_1$, for which
the information theoretic problem of query release is simpler, since a $d$-dimensional $\ell_1$ metric can be
decomposed into $d$ one-dimensional metric spaces. On the one hand, for privacy, we have a stronger constraint
on the type of metric embeddings we can employ: we require them to be low sensitivity embeddings, which
map neighboring databases to databases of bounded distance (in the Hamming metric). The embedding
corresponding to Bourgain's theorem does not satisfy this property. On the other hand, we do not require that
the embedding preserve the distances between pairs of database points, or pairs of query points, but merely
between query points and database points. This allows us to prove a variant of Bourgain's theorem that is
1-sensitive. We think that metric embeddings may prove to be a useful tool in the design of efficient private
query release algorithms. In particular, identifying other privacy-friendly metrics and studying other
low sensitivity embeddings are very interesting future directions.

2 Preliminaries 
2.1 Model 

Let $(X, d)$ be an arbitrary metric space. Let $\mathcal{D} \in X^n$ be a database consisting of $n$ points in the metric space.
For the sake of presentation, we will focus on metric spaces with diameter 1 throughout the main body of
this paper. This is simply a matter of scaling: all of our error bounds hold for arbitrary diameter spaces, with
a linear dependence on the diameter.

We will consider the problem of releasing distance queries while preserving the privacy of the elements
in the database, where each query is a point $y \in X$ in the metric space and the answer for a given query $y$ is
the average distance from $y$ to the elements in the database, i.e., $\frac{1}{n} \sum_{x \in \mathcal{D}} d(x, y)$. Let $Q$ be the set of
distance queries asked by the data analyst. We will let $\mathcal{D}(Q) \in \mathbb{R}^k$ denote the exact answer to the queries $Q$
with respect to database $\mathcal{D}$. We will usually use $x_i$'s to denote data points and $y_j$'s to denote query points.
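As a concrete illustration of the definition above, the following minimal sketch evaluates a single (non-private) distance query. The function names `l1_distance` and `distance_query` and the toy database are illustrative assumptions, not part of the paper; any metric with diameter at most 1 could be passed in place of the $\ell_1$ distance.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def l1_distance(x: Point, y: Point) -> float:
    """l_1 distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(x, y))


def distance_query(database: Sequence[Point], y: Point,
                   metric: Callable[[Point, Point], float] = l1_distance) -> float:
    """Exact answer D(y): the average distance from y to the database points."""
    return sum(metric(x, y) for x in database) / len(database)


# Toy database of n = 3 points whose pairwise l_1 distances are at most 1.
database = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
answer = distance_query(database, (0.0, 0.0))
```

This computes only the exact answer; privacy has to come from the mechanisms discussed in the rest of the paper.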

Query Release Mechanisms. We will consider two settings for query release in this paper. The first setting
is the interactive setting, where the queries are not given upfront but instead arrive online. An interactive
query release mechanism needs to provide an answer for each query as it arrives. The answer can depend
on the query, the private database, and the state of the mechanism, but not on future queries. An interactive
query release mechanism is said to be efficient if the per-query running time is polynomial in the database
size $n$ and the dimension of the metric space $\ell$.

The second setting is the non-interactive (offline) setting. A non-interactive query release mechanism takes the
database as input and outputs an algorithm that can answer all queries without further access to the database.
We say an offline query release mechanism is efficient if both the running time of the mechanism and the
running time per query of the algorithm it constructs are polynomial in $n$ and $\ell$.

2.2 Differential Privacy 

We let $\|\mathcal{D}_1 - \mathcal{D}_2\|_H$ denote the Hamming distance between two databases $\mathcal{D}_1$ and $\mathcal{D}_2$. Two databases are
adjacent if the Hamming distance between them is at most 1 (i.e., they differ in a single element). We will
write $n = |\mathcal{D}|$ to denote the size of the database. We will consider the by now standard privacy solution
concept of "differential privacy" [DMNS06].

Definition 1 ($(\epsilon, \delta)$-Differential Privacy). A mechanism $M$ is $(\epsilon, \delta)$-differentially private if for all adjacent
databases $\mathcal{D}_1$ and $\mathcal{D}_2$, any set of queries $Q$, and all subsets of possible answers $S \subseteq \mathbb{R}^k$, we have

$$\Pr[M(\mathcal{D}_1, Q) \in S] \le \exp(\epsilon) \Pr[M(\mathcal{D}_2, Q) \in S] + \delta .$$

If $\delta = 0$, then we say that $M$ is $\epsilon$-differentially private.

A function $f : X^n \to \mathbb{R}$ is said to have sensitivity $\Delta$ with respect to the private database if
$\max_{\mathcal{D}_1, \mathcal{D}_2} |f(\mathcal{D}_1) - f(\mathcal{D}_2)| \le \Delta$, where the max is taken over all pairs of adjacent databases.

When we talk about the privacy of interactive mechanisms, the range of the mechanism is considered to
be the entire transcript of queries and answers communicated between the data analyst and the mechanism
(see [DRV10, HR10] for a more precise formalization of the model). An interactive mechanism is
$(\epsilon, \delta)$-differentially private if the probability that the transcript falls into any chosen subset differs by at most an
$\exp(\epsilon)$ multiplicative factor and a $\delta$ additive factor for any two adjacent databases.

Given a mechanism, we will measure its accuracy in terms of answering distance queries as follows.

Definition 2 (Accuracy). A mechanism $M$ is $(\alpha, \beta)$-accurate if for any database $\mathcal{D}$ and any set of queries
$Q$, with probability at least $1 - \beta$, the mechanism answers every query up to an additive error $\alpha$, i.e.,

$$\Pr[\|M(\mathcal{D}, Q) - \mathcal{D}(Q)\|_\infty \le \alpha] \ge 1 - \beta .$$

3 Releasing $\ell_1$-Distance Queries

In this section, we consider $\ell_1$ distance queries, i.e., we let $X \subseteq [0, 1]^\ell$ and $d = \|\cdot\|_1$ such that the diameter
of $X$ (with respect to $\ell_1$) is 1. We present private, computationally efficient mechanisms for releasing the
answers to $\ell_1$ distance queries in both the interactive and offline settings. These mechanisms for releasing $\ell_1$
distances will serve as important building blocks for our results for other metrics. First, let us formally state
our result in the interactive setting:

Theorem 1. There is an interactive $(\epsilon, \delta)$-differentially private mechanism for releasing answers to distance
queries with respect to $(X, \|\cdot\|_1)$ that is $(\alpha, \beta)$-accurate with $\alpha$ satisfying



There is also an interactive $\epsilon$-differentially private mechanism for releasing distance queries with respect
to $(X, \|\cdot\|_1)$ that is $(\alpha, \beta)$-accurate for $\alpha$ satisfying



The per-query running time of both mechanisms is $O(\ell n)$.

As an extension of the above theorem, we also get the following result in the offline setting. 

Theorem 2. There is a poly-time $(\epsilon, \delta)$-differentially private offline mechanism that is $(\alpha, \beta)$-accurate for
releasing distance queries with respect to $(X, \|\cdot\|_1)$, for $\alpha$ satisfying



There is a poly-time $\epsilon$-differentially private offline mechanism that is $(\alpha, \beta)$-accurate for releasing distance
queries with respect to $(X, \|\cdot\|_1)$, for $\alpha$ satisfying



Remark 2.1. Note that the offline mechanism has no dependence on the number of queries asked, in either
the accuracy or the running time. In one shot, it produces a data structure that can be used to accurately
answer all $\ell_1$ queries.

Proof Overview. To prove Theorem 1, we will use the connection between private query release and online
learning, which was established in [RR10, HR10, GRU12, JT12]. We will briefly review this connection
in Section 3.1. Based on this connection, it suffices to provide an online learning algorithm that learns the
function mapping queries to their answers with respect to the database using a small number of updates.
Next, we will shift our viewpoint by interpreting each database as a 1-Lipschitz and convex function that
maps the query points to real values in $[0, 1]$. The structure of the $\ell_1$ metric allows us to reduce the
problem to learning $\ell$ different one-dimensional 1-Lipschitz and convex functions, for which we propose in
Section 3.2 an online learning algorithm that only requires $O(\alpha^{-1/2})$ updates to achieve an additive error
bound $\alpha$. Then, we combine these ingredients to give an interactive differentially private mechanism
for releasing answers for $\ell_1$ distance queries in Section 3.3 and complete the proof of Theorem 1. Roughly
speaking, the interactive mechanism will always maintain a hypothesis function that maps queries to answers,
and it will update the hypothesis function using the online learning algorithm whenever the hypothesis
function makes a mistake. Finally, we show that there is an explicit set of $O(\ell^2/\alpha)$ queries such that asking
these queries to the interactive mechanism is sufficient to guarantee that the hypothesis function is accurate
with respect to all queries. So Theorem 2 follows because the offline mechanism can first ask these queries
to the interactive mechanism and then release the hypothesis function.

3.1 Query Release from Iterative Database Construction 

In this section, we give a variant of the definition of the iterative construction framework defined in
[GRU12], generalizing the median mechanism and the multiplicative weights mechanism [RR10, HR10].
Let $F_{\mathcal{D}} : C \to \mathbb{R}$ be the function such that for each $y \in C$, $F_{\mathcal{D}}(y) = \mathcal{D}(y)$, i.e., $F_{\mathcal{D}}$ maps queries to their
answers evaluated on $\mathcal{D}$. Note that $F_{\mathcal{D}}(y)$ is a $1/n$-sensitive function in the private database. The variant
of the definition of Iterative Database Construction that we give allows the learning algorithm to also learn
the answers to some set $S$ of $O(1/n)$-sensitive functions on $F_{\mathcal{D}}$ as well (in addition to just $F_{\mathcal{D}}(y)$). In our
application, $S$ will consist of queries about the derivative of $F_{\mathcal{D}}$, where in our case, $F_{\mathcal{D}}$ will be a (one-sided)
differentiable function.
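To make the role of $1/n$-sensitivity concrete, here is a minimal sketch of how a single distance query could be answered with the standard Laplace mechanism of [DMNS06]. This is the generic baseline, not the iterative-construction mechanism of this paper, and the helper names `laplace_noise` and `noisy_distance_query` are illustrative assumptions. The sketch assumes the metric space has diameter at most 1, so that replacing one database point changes the average distance by at most $1/n$.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exact_l1_query(database, y):
    """Average l_1 distance from y to the database points."""
    return sum(sum(abs(a - b) for a, b in zip(x, y)) for x in database) / len(database)


def noisy_distance_query(database, y, epsilon: float, rng: random.Random) -> float:
    """Answer one distance query with Laplace noise of scale sensitivity / epsilon.

    Assumes diameter <= 1, which gives sensitivity 1/n for the average distance.
    """
    sensitivity = 1.0 / len(database)
    return exact_l1_query(database, y) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
# n = 1000 points on a segment of l_1 length 1 inside [0, 1]^2.
database = [(i / 1998.0, i / 1998.0) for i in range(1000)]
query = (0.0, 0.0)
exact = exact_l1_query(database, query)
noisy = noisy_distance_query(database, query, epsilon=1.0, rng=rng)
```

Answering many queries this way requires splitting the privacy budget across them, which is the limitation the iterative construction framework is designed to sidestep. A production implementation would also need to address floating-point issues in noise sampling, which this sketch ignores.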





The total running time of this mechanism is O 
Definition 3 ([RR10, HR10, GRU12]). Let $F_{\mathcal{D}} : C \to \mathbb{R}$ be the function such that for each $y \in C$,
$F_{\mathcal{D}}(y) = \mathcal{D}(y)$, i.e., $F_{\mathcal{D}}$ maps queries to their answers evaluated on $\mathcal{D}$. Let $S = \{f_1, \dots, f_{|S|}\}$ be a
collection of functions $f_i : X \times X^n \to \mathbb{R}$ that are each $O(1/n)$-sensitive in the private database $\mathcal{D}$. Given an
error bound $\alpha > 0$ and an error tolerance $c$, an iterative database construction algorithm using functions $S$
with respect to a class of queries $C$ plays the following game with an adversary:

1. The algorithm maintains a hypothesis function $F_t : C \to \mathbb{R}$ on which it can evaluate queries, which
is initialized to be some default function $F_0$ at step 0.

2. In each step $t \ge 1$, the adversary (adaptively) chooses a query $y_t \in X$, at which point the algorithm
predicts a query value $F_t(y_t)$. If $|F_t(y_t) - F_{\mathcal{D}}(y_t)| > \alpha$, then we say the algorithm has made a mistake.
At this point, the algorithm receives $|S|$ values $a_1, \dots, a_{|S|} \in \mathbb{R}$ such that for all $i$:
$a_i \in [f_i(y_t, \mathcal{D}) - c\alpha, f_i(y_t, \mathcal{D}) + c\alpha]$. The algorithm may update its hypothesis function using this information.

Definition 4 (Mistake Bound). An iterative database construction algorithm has a mistake bound
$m : \mathbb{R}_+ \to \mathbb{N}_+$ if for any given error bound $\alpha$, no adversary can (adaptively) choose a sequence of queries to
force the algorithm to make $m(\alpha) + 1$ mistakes.
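The game in Definitions 3 and 4 can be sketched as a short simulation. The code below is only an illustration under simplifying assumptions: the set $S$ contains a single function (the query answer itself), the feedback is a fixed offset inside the allowed window $[f - c\alpha, f + c\alpha]$, and the learner is a hypothetical `LookupLearner` that simply memorises the feedback. It is not the learner used in this paper; it only shows how mistakes are counted.

```python
class LookupLearner:
    """Toy learner: predicts 0 by default and memorises feedback per query."""

    def __init__(self):
        self.table = {}

    def predict(self, y):
        return self.table.get(y, 0.0)

    def update(self, y, value):
        self.table[y] = value


def play_game(learner, queries, true_answer, alpha, c):
    """Run the iterative database construction game and count mistakes."""
    mistakes = 0
    for y in queries:
        if abs(learner.predict(y) - true_answer(y)) > alpha:
            mistakes += 1
            # Any value within +/- c * alpha of the truth is admissible feedback.
            learner.update(y, true_answer(y) + c * alpha / 2.0)
    return mistakes


database = [0.2, 0.8]  # one-dimensional toy database


def true_answer(y):
    return sum(abs(x - y) for x in database) / len(database)


mistakes = play_game(LookupLearner(), [0.0, 0.5, 0.0, 1.0, 0.5],
                     true_answer, alpha=0.1, c=0.25)
```

The memorising learner makes one mistake per distinct query, so its mistake bound grows with the number of queries; the point of Section 3.2 is a learner whose mistake bound depends only on $\alpha$.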

Lemma 3 ([RR10, HR10, GRU12]). If there is an iterative database construction using functions $S$ for
releasing a query class $C$ with mistake bound $m(\alpha)$ with respect to some error tolerance $c$, then there is
an $(\epsilon, \delta)$-differentially private mechanism in the interactive setting that is $(\alpha, \beta)$-accurate for answering
queries $C$, for $\alpha$ satisfying

$$c\alpha = \frac{3000}{n\epsilon} \sqrt{|S| \, m(\alpha)} \, \log(4/\delta) \log(k/\beta) .$$

There is also an $\epsilon$-differentially private mechanism in the interactive setting that is $(\alpha, \beta)$-accurate, for $\alpha$
satisfying

$$c\alpha = \frac{3000}{n\epsilon} \, |S| \cdot m(\alpha) \log(k/\beta) .$$

Moreover, the per-query running time of the query release mechanism is equal (up to constant factors) to
the per-round running time of the iterative database construction algorithm.

Representing $\ell_1$ Databases as Decomposable Convex Functions. Consider a database $\mathcal{D}$ where the
universe is the $\ell$-dimensional unit cube $X = [0, 1]^\ell$ endowed with the $\ell_1$ metric. In this setting, the function
mapping queries $y \in X$ to their answers takes the form $F_{\mathcal{D}}(y) = \frac{1}{n} \sum_{x \in \mathcal{D}} \|x - y\|_1$, which is a
1-Lipschitz convex function of $y$. We wish to proceed by providing an iterative database construction for $\ell_1$
distance queries using these properties. Observe that because we are working with the $\ell_1$ metric, we can
write $F_{\mathcal{D}}(y) = \sum_{i=1}^{\ell} F_{\mathcal{D}}^{(i)}(y_i)$, where $F_{\mathcal{D}}^{(i)}(y_i) = \frac{1}{n} \sum_{x \in \mathcal{D}} |x_i - y_i|$. Observe that each function $F_{\mathcal{D}}^{(i)}$ is
1-Lipschitz and convex, and maps the one-dimensional interval $[0, 1]$ into $[0, 1]$. Therefore, to learn an approximation to $F_{\mathcal{D}}(y)$
up to some error $\alpha$, it suffices to learn an approximation to each $F_{\mathcal{D}}^{(i)}$ to error $\alpha/\ell$. This is the approach
we take.
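The coordinate-wise decomposition can be checked numerically. The following is a small sketch with illustrative helper names (`avg_l1` and `coordinate_fn` are not from the paper) that evaluates both sides of the identity $F_{\mathcal{D}}(y) = \sum_i F_{\mathcal{D}}^{(i)}(y_i)$ on a toy database.

```python
def avg_l1(database, y):
    """F_D(y): average l_1 distance from y to the database points."""
    return sum(sum(abs(a - b) for a, b in zip(x, y)) for x in database) / len(database)


def coordinate_fn(database, i, y_i):
    """F_D^(i)(y_i): average absolute difference in coordinate i."""
    return sum(abs(x[i] - y_i) for x in database) / len(database)


database = [(0.1, 0.9, 0.3), (0.4, 0.2, 0.8), (0.7, 0.5, 0.0)]
y = (0.25, 0.6, 0.5)

whole = avg_l1(database, y)
by_coordinate = sum(coordinate_fn(database, i, y[i]) for i in range(len(y)))
```

The two quantities agree up to floating-point rounding, which is why learning each one-dimensional function to error $\alpha/\ell$ suffices for overall error $\alpha$.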

3.2 Learning 1-Lipschitz Convex Functions 

In this section we study the problem of iteratively constructing an arbitrary continuous, 1-Lipschitz, and
convex function $G : [0, 1] \to [0, 1]$ up to some additive error $\alpha_1$ with noisy oracle access to the function.
Here, the oracle can return the function value $G(x)$ and the derivative $G'(x)$ given any $x \in [0, 1]$ up to an
additive error of $\alpha_1/4$. We assume the derivative $G'$ is well defined in $[0, 1]$: if $G$ is not differentiable
at $x$, then we assume the derivative $G'(x)$ is (consistently) defined to be any value between the left and right
derivatives at $x$.
Learning a 1-Lipschitz and Convex Function

Maintain $\hat{G}(x) = \max_k \{a_k \cdot x + b_k\}$ where $a_k \in [-1, 1]$ and $b_k \in \mathbb{R}$ define a set of linear functions.
Initial Step: Let $a_0 = 0$ and $b_0 = 0$.

Update Step $t \ge 1$: While the update generator returns a distinguishing point $x_t^*$, we shall add the
tangent line at $x_t^*$ with respect to function $G$ to the set of linear functions, i.e., $a_t = G'(x_t^*)$ and
$b_t = G(x_t^*) - G'(x_t^*) \cdot x_t^*$.

Figure 1: An algorithm for learning a 1-Lipschitz and convex one-dimensional function by approximating it
with a piecewise linear function. The algorithm always predicts according to $\hat{G}$. When it makes a mistake,
it is given an update point $x_t^*$ together with $G(x_t^*)$ and $G'(x_t^*)$.
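The algorithm in Figure 1 can be sketched in a few lines. The code below assumes an exact oracle and, purely for illustration, uses a hypothetical update generator that scans a fixed grid and returns the first point where the hypothesis errs by more than $\alpha_1$; in the paper the update points come from the query release game rather than from a grid scan. The class name `TangentLearner` and the test function $G(x) = x^2/2$ are assumptions of this sketch.

```python
class TangentLearner:
    """Hypothesis G_hat(x) = max_k (a_k * x + b_k), starting from the zero line."""

    def __init__(self):
        self.lines = [(0.0, 0.0)]  # (slope a_k, intercept b_k)

    def predict(self, x):
        return max(a * x + b for a, b in self.lines)

    def update(self, x_star, value, slope):
        # Add the tangent line at x_star: a_t = G'(x*), b_t = G(x*) - G'(x*) * x*.
        self.lines.append((slope, value - slope * x_star))


def learn(G, dG, alpha, grid):
    """Feed update points to the learner until no grid point errs by more than alpha."""
    learner = TangentLearner()
    updates = 0
    while True:
        bad = [x for x in grid if abs(G(x) - learner.predict(x)) > alpha]
        if not bad:
            return learner, updates
        x_star = bad[0]
        learner.update(x_star, G(x_star), dG(x_star))
        updates += 1


def G(x):
    return x * x / 2.0  # convex, 1-Lipschitz on [0, 1], values in [0, 1/2]


def dG(x):
    return x


alpha = 0.01
grid = [i / 1000.0 for i in range(1001)]
learner, updates = learn(G, dG, alpha, grid)
max_error = max(abs(G(x) - learner.predict(x)) for x in grid)
```

Because every tangent line of a convex function lies below it, the hypothesis never overestimates $G$, and each update fixes the error at the update point. That is why the loop terminates on a finite grid.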

We will first present an algorithm that learns any one-dimensional 1-Lipschitz and convex function using
an exact oracle. Then, we will explain why this algorithm is in fact noise-tolerant. Finally, we show that this
result naturally extends to multi-dimensional decomposable functions.

Learning 1-D Functions with an Accurate Oracle. We will consider maintaining a hypothesis piecewise
linear function $\hat{G}(x)$ via the algorithm given in Figure 1. We will analyze the number of updates needed by
this algorithm before it has learned a piecewise linear function $\hat{G}$ that approximates $G$ everywhere up to
additive error $\alpha_1$.

First, for any 1-Lipschitz (possibly non-convex) function $G$ and any given error bound $\alpha_1 > 0$, the
algorithm in Figure 1 will make at most $1/\alpha_1$ mistakes. This is because the function being 1-Lipschitz implies
that the tangent line at each update point $x_t^*$ is a good approximation (up to error $\alpha_1$) in the neighborhood
$[x_t^* - \alpha_1, x_t^* + \alpha_1]$, and hence any pair of update points are at least $\alpha_1$ away from each other. Further, it is
easy to construct examples where this bound is tight up to a constant.

Next, we will show that using the convexity of function $G$, we can improve the mistake bound to
$O(1/\sqrt{\alpha_1})$.

Lemma 4. For any 1-Lipschitz convex function $G$ and any given error bound $\alpha_1 \in (0, 1)$, the algorithm in
Figure 1 will make at most $O(1/\sqrt{\alpha_1})$ updates.

Proof. Consider any two update points $x_t^*$ and $x_{t'}^*$. Let us assume w.l.o.g. that $t < t'$. Then, by our
assumption, the tangent line at $x_t^*$ does not approximate the function value of $G$ at $x_{t'}^*$ up to an additive error
of $\alpha_1$. Therefore, we get that

$$
\begin{aligned}
\alpha_1 &< G(x_{t'}^*) - \big(G'(x_t^*)(x_{t'}^* - x_t^*) + G(x_t^*)\big) \\
&= G(x_{t'}^*) - G(x_t^*) - G'(x_t^*)(x_{t'}^* - x_t^*) \\
&\le G'(x_{t'}^*)(x_{t'}^* - x_t^*) - G'(x_t^*)(x_{t'}^* - x_t^*) \\
&= \big(G'(x_{t'}^*) - G'(x_t^*)\big)(x_{t'}^* - x_t^*) , \qquad (1)
\end{aligned}
$$

where the second inequality is by the convexity of $G$.

Next, consider a maximal set of update points in sorted order: $0 \le x_1 < \cdots < x_T \le 1$. Since $G$ is
convex and 1-Lipschitz, we have that $-1 \le G'(x_1) \le \cdots \le G'(x_T) \le 1$. Therefore, we get that

$$
\begin{aligned}
2 \cdot 1 &\ge \big(G'(x_T) - G'(x_1)\big)(x_T - x_1) \\
&= \sum_{t=1}^{T-1} \big(G'(x_{t+1}) - G'(x_t)\big) \cdot \sum_{t=1}^{T-1} (x_{t+1} - x_t) \\
&\ge \Big( \sum_{t=1}^{T-1} \sqrt{\big(G'(x_{t+1}) - G'(x_t)\big)(x_{t+1} - x_t)} \Big)^2 \\
&\ge (T-1)^2 \alpha_1 .
\end{aligned}
$$

Here, the first inequality is by $G'(x_t) \in [-1, 1]$ and $x_t \in [0, 1]$ for $t = 1, \dots, T$; the second inequality is a
simple application of the Cauchy-Schwarz inequality; the last inequality is by equation (1). So by the above
inequality, the number of mistakes is at most $T \le \sqrt{2/\alpha_1} + 1 = O(1/\sqrt{\alpha_1})$. □



Learning 1-D Functions with a Noisy Oracle. Note that the domain of the function is $[0, 1]$. So if the
tangent line at $x'$ approximates the function value at $x$ up to additive error $\alpha_1/2$, i.e.,

$$G(x) - \big(G(x') + G'(x')(x - x')\big) \le \frac{\alpha_1}{2} ,$$

then a noisy version of the tangent line $\hat{G}(x') + \hat{G}'(x')(x - x')$, where $\hat{G}(x') \in [G(x') - \frac{\alpha_1}{4}, G(x') + \frac{\alpha_1}{4}]$
and $\hat{G}'(x') \in [G'(x') - \frac{\alpha_1}{4}, G'(x') + \frac{\alpha_1}{4}]$, will approximate the value at $x$ up to additive error $\alpha_1$. Hence,
the mistake bound of the algorithm in Figure 1 for learning a 1-Lipschitz and convex function up to additive
error $\alpha_1$ using a noisy oracle is no more than the mistake bound for learning the same function up to additive
error $\alpha_1/2$ with an accurate oracle. Hence, the mistake bound is still of order $O(1/\sqrt{\alpha_1})$.

Lemma 5. For any 1-Lipschitz convex function $G$ and any given error bound $\alpha_1 \in (0, 1)$, the algorithm in
Figure 1 will make at most $O(1/\sqrt{\alpha_1})$ updates with an $\alpha_1/4$-noisy oracle.
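The noise-tolerance argument rests on a simple error budget: an exact tangent error of at most $\alpha_1/2$, plus $\alpha_1/4$ from the noisy value, plus $\alpha_1/4 \cdot |x - x'| \le \alpha_1/4$ from the noisy slope. The sketch below checks this numerically for one illustrative function, $G(x) = x^2/2$, by trying the four extreme perturbations of the oracle outputs. The function name `worst_noisy_error` and the chosen parameters are assumptions of this example.

```python
def G(x):
    return x * x / 2.0  # convex and 1-Lipschitz on [0, 1]


def dG(x):
    return x


def worst_noisy_error(x_prime, alpha, grid):
    """Largest error of a noisy tangent at x_prime over points where the
    exact tangent is already within alpha / 2 of G."""
    worst = 0.0
    for x in grid:
        exact_gap = G(x) - (G(x_prime) + dG(x_prime) * (x - x_prime))
        if exact_gap > alpha / 2.0:
            continue  # the premise of the argument does not hold at this x
        for value_noise in (-alpha / 4.0, alpha / 4.0):
            for slope_noise in (-alpha / 4.0, alpha / 4.0):
                noisy_line = (G(x_prime) + value_noise) \
                    + (dG(x_prime) + slope_noise) * (x - x_prime)
                worst = max(worst, abs(G(x) - noisy_line))
    return worst


alpha = 0.04
grid = [i / 1000.0 for i in range(1001)]
worst = worst_noisy_error(0.5, alpha, grid)
```

Only the extreme perturbations are tried because the error is linear in each noise term, so its maximum over the noise box is attained at a corner.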



Learning Decomposable Functions. Suppose we want to learn an $\ell$-dimensional decomposable convex
function $F_{\mathcal{D}} = \sum_{i=1}^{\ell} F_{\mathcal{D}}^{(i)}$ up to additive error $\alpha$, where each $F_{\mathcal{D}}^{(i)}$ is convex and 1-Lipschitz. Then, it
suffices to learn the 1-Lipschitz convex functions $F_{\mathcal{D}}^{(i)}$ for each coordinate up to error $\alpha_1 = \alpha/\ell$. So as a
simple corollary of Lemma 5, we have the following lemma:

Lemma 6. For any function $F_{\mathcal{D}} : [0, 1]^\ell \to \mathbb{R}$ such that:

1. $F_{\mathcal{D}}(y) = \sum_{i=1}^{\ell} F_{\mathcal{D}}^{(i)}(y_i)$ where each $F_{\mathcal{D}}^{(i)} : [0, 1] \to [0, 1]$ is 1-Lipschitz and convex, and:

2. For every $y \in [0, 1]^\ell$ and every $i \in [\ell]$: $F_{\mathcal{D}}^{(i)}(y_i)$ and $(F_{\mathcal{D}}^{(i)})'(y_i)$ are $1/n$-sensitive in $\mathcal{D}$

there is an iterative database construction algorithm for $F_{\mathcal{D}}$ using a collection of $2\ell$ functions $S$ with respect
to an error tolerance $1/(4\ell)$ that has a mistake bound of $m(\alpha) = O(\ell^{3/2}/\alpha^{1/2})$.

Proof. Let $\alpha_1 = \alpha/\ell$. Consider the following algorithm:

1. The algorithm maintains a hypothesis function $\hat{F}_t = \sum_{i=1}^{\ell} \hat{F}_t^{(i)}$ by maintaining $\ell$ one-dimension
piecewise-linear hypothesis functions $\hat{F}_t^{(i)} : [0, 1] \to [0, 1]$ for each $i \in [\ell]$ via the one-dimension
learning algorithm with error tolerance $\alpha_1$, and letting $\hat{F}_t = \sum_{i=1}^{\ell} \hat{F}_t^{(i)}$.

2. If the algorithm makes a mistake on query $y_t \in [0, 1]^\ell$, then the algorithm asks query $y_t$ to each of the
one-dimensional learning algorithms. On any of the one-dimensional learning algorithms $i$ on which a
mistake is made, the algorithm queries two values: $F_D^{(i)}(y_{ti})$ and $(F_D^{(i)})'(y_{ti})$, tolerating additive error
up to $\alpha_1/4$, and updates the hypothesis $\hat{F}_t^{(i)}$, $i = 1, \dots, \ell$, accordingly using the one dimensional
learning algorithm. Note that this leads to at most $|S| = 2\ell$ queries per update.

Note that whenever the above algorithm makes a mistake, at least one of the one-dimensional algorithms
must also make a mistake (since otherwise the total error was at most $\ell\alpha_1 = \alpha$), and therefore we can
charge this mistake to the mistake bound of at least one of the one-dimensional learning algorithms. By
Lemma 5, the number of times that the hypothesis function $\hat{F}_t^{(i)}$ in each coordinate admits additive error at
least $\alpha_1$ is at most $O(1/\sqrt{\alpha_1})$. So the above iterative database construction algorithm has mistake bound
$O(\ell/\sqrt{\alpha_1}) = O(\ell^{3/2}/\alpha^{1/2})$. □
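The charging argument can be illustrated with a small sketch. It is not the learner of Figure 1 (which is not reproduced in this section): as a stand-in, each coordinate keeps the maximum of 0 and the tangent lines seen so far, and it uses an exact rather than a noisy oracle. The names `TangentLearner` and `learn_decomposable` are hypothetical.

```python
class TangentLearner:
    """Assumed stand-in for the 1-D learner: max of 0 and all stored tangents."""

    def __init__(self):
        self.tangents = []  # entries are (x0, value at x0, slope at x0)

    def predict(self, x):
        return max([0.0] + [v + s * (x - x0) for x0, v, s in self.tangents])

    def update(self, x0, value, slope):
        self.tangents.append((x0, value, slope))


def learn_decomposable(fs, dfs, queries, alpha):
    """One learner per coordinate, each with tolerance alpha1 = alpha / l.

    fs[i] and dfs[i] are the i-th coordinate function and a (sub)derivative.
    Returns the learners and the total number of coordinate updates.
    """
    l = len(fs)
    alpha1 = alpha / l
    learners = [TangentLearner() for _ in range(l)]
    updates = 0
    for y in queries:
        errs = [abs(fs[i](y[i]) - learners[i].predict(y[i])) for i in range(l)]
        if sum(errs) <= alpha:
            continue  # no mistake on this query
        # A mistake implies some coordinate is off by more than alpha1.
        for i in range(l):
            if errs[i] > alpha1:
                learners[i].update(y[i], fs[i](y[i]), dfs[i](y[i]))
                updates += 1
    return learners, updates


if __name__ == "__main__":
    centers = [0.2, 0.5, 0.9]
    fs = [lambda t, c=c: abs(t - c) for c in centers]
    dfs = [lambda t, c=c: (t > c) - (t < c) for c in centers]
    qs = [(i / 10, (3 * i % 11) / 10, (7 * i % 11) / 10) for i in range(11)]
    print(learn_decomposable(fs, dfs, qs, alpha=0.3)[1])
```

In this sketch every update is triggered by a coordinate whose error exceeds $\alpha_1$, so the total number of updates is at most $\ell$ times the per-coordinate mistake bound, mirroring the proof.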

3.3 Proofs of Theorem 1 and Theorem 2

Proof of Theorem 1. Since the $\ell_1$ distance function in each coordinate has range $[0, 1]$ and is 1-Lipschitz, we
get that $F_D^{(i)}$ and the derivative $(F_D^{(i)})'$ are $O(1/n)$-sensitive. So by Lemma 6, there is an iterative database
construction algorithm for releasing answers to $\ell_1$ distance queries that uses a set $S$ of $2\ell$ $O(1/n)$-sensitive
queries with error $\alpha_1$, and the algorithm has mistake bound $O(\ell^{3/2}/\alpha^{1/2})$.
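To make the decomposition concrete, here is a small sketch (function names are hypothetical) of the per-coordinate functions behind an average $\ell_1$ distance query, together with a slope that can serve as the derivative away from the data values. It assumes data and query coordinates lie in $[0, 1]$.

```python
def coord_avg_dist(column, t):
    """F^(i)(t): average of |x_i - t| over the i-th coordinates of the data."""
    return sum(abs(x - t) for x in column) / len(column)


def coord_avg_slope(column, t):
    """Slope of F^(i) at t: (#points below t - #points above t) / n."""
    return sum((x < t) - (x > t) for x in column) / len(column)


def avg_l1_dist(data, y):
    """Average l1 distance from the query point y to the data points."""
    return sum(sum(abs(a - b) for a, b in zip(x, y)) for x in data) / len(data)


if __name__ == "__main__":
    data = [(0.1, 0.9), (0.4, 0.2), (0.8, 0.5), (0.3, 0.3)]
    y = (0.25, 0.6)
    columns = list(zip(*data))
    print(avg_l1_dist(data, y))
    print(sum(coord_avg_dist(c, t) for c, t in zip(columns, y)))
```

Replacing a single data point moves each `coord_avg_dist` value by at most $1/n$ and each `coord_avg_slope` value by at most $2/n$, which is the $O(1/n)$ sensitivity used in the proof.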

By plugging the parameters of the above iterative database construction algorithm into Lemma 3, we get
that there is an $(\epsilon, \delta)$-differentially private mechanism in the interactive setting that is $(\alpha, \beta)$-accurate for
releasing distance queries with respect to metric space $([0, 1]^\ell, \|\cdot\|_1)$, for $\alpha$ satisfying

$\frac{\alpha}{\ell} = O\left( \frac{1}{n\epsilon} \sqrt{\frac{\ell^{5/2}}{\alpha^{1/2}}} \, \log(4/\delta) \log(k/\beta) \right)$.

Solving the above we get that

$\alpha = O\left( \frac{\ell^{9/5} \log^{4/5}(4/\delta) \log^{4/5}(k/\beta)}{n^{4/5} \epsilon^{4/5}} \right)$.



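We also get that there is an $\epsilon$-differentially private mechanism in the interactive setting that is $(\alpha, \beta)$-accurate,
for $\alpha$ satisfying

$\frac{\alpha}{\ell} = O\left( \frac{1}{n\epsilon} \cdot \frac{\ell^{5/2}}{\alpha^{1/2}} \log(k/\beta) \right)$.

Solving the above we get that

$\alpha = O\left( \frac{\ell^{7/3} \log^{2/3}(k/\beta)}{n^{2/3} \epsilon^{2/3}} \right)$.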



The analysis of the running time per query is straightforward and hence omitted. □ 
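As a sanity check on the algebra, the following sketch verifies that the closed forms solve the corresponding fixed-point relations. It assumes the relations $\alpha/\ell = \ell^{5/2}\alpha^{-1/2}\log(k/\beta)/(n\epsilon)$ for pure privacy and $\alpha/\ell = \sqrt{\ell^{5/2}\alpha^{-1/2}}\,\log(4/\delta)\log(k/\beta)/(n\epsilon)$ for approximate privacy, with every hidden constant set to 1; the function names are hypothetical.

```python
import math


def alpha_pure(l, n, eps, k, beta):
    """Closed form for eps-differential privacy (hidden constant taken as 1)."""
    return l ** (7 / 3) * math.log(k / beta) ** (2 / 3) / (n * eps) ** (2 / 3)


def alpha_approx(l, n, eps, delta, k, beta):
    """Closed form for (eps, delta)-differential privacy (constant taken as 1)."""
    logs = math.log(4 / delta) * math.log(k / beta)
    return l ** (9 / 5) * logs ** (4 / 5) / (n * eps) ** (4 / 5)


if __name__ == "__main__":
    l, n, eps, delta, k, beta = 3, 10_000, 0.5, 1e-6, 100, 0.05
    a = alpha_pure(l, n, eps, k, beta)
    rhs = l ** 2.5 * a ** -0.5 * math.log(k / beta) / (n * eps)
    print(a / l, rhs)  # the two sides of the pure-privacy relation
    b = alpha_approx(l, n, eps, delta, k, beta)
    logs = math.log(4 / delta) * math.log(k / beta)
    print(b / l, math.sqrt(l ** 2.5 / b ** 0.5) * logs / (n * eps))
```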

Proof of Theorem 2. Consider running the online query release mechanism with accuracy $\alpha' = \alpha/2$. To
give an offline mechanism, we simply describe a fixed set of $\ell/\alpha'$ queries that we can make to each of the
$\ell$ one-dimensional learning algorithms maintaining $\hat{F}_D^{(i)}$ that guarantees that for each $y \in [0, 1]$,
$|\hat{F}_D^{(i)}(y) - F_D^{(i)}(y)| \le \alpha/\ell$. Once we have this condition, we know that for each $y \in [0, 1]^\ell$, $|\hat{F}_D(y) - F_D(y)| \le \alpha$.
The queries are simple: we just take our query set to be a grid: $T = \{0, \alpha'/\ell, 2\alpha'/\ell, 3\alpha'/\ell, \dots, 1\}$. By the
guarantees of the 1-dimensional learning algorithm, we have that for every $y \in T$, $|\hat{F}_D^{(i)}(y) - F_D^{(i)}(y)| \le \alpha'/\ell$.
Moreover, by the fact that $F_D^{(i)}$ is 1-Lipschitz, and for every $y \in [0, 1]$, $d(y, T) \le \alpha'/\ell$, we have that
for every $y \in [0, 1]$, $|\hat{F}_D^{(i)}(y) - F_D^{(i)}(y)| \le 2\alpha'/\ell = \alpha/\ell$, which is the condition we wanted. In total, we
make $2\ell^2/\alpha$ queries, and the theorem follows by instantiating the guarantees of the online mechanism with
$k = 2\ell^2/\alpha$. □
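The grid used in this proof is easy to write down. The sketch below (the name `make_grid` is hypothetical) builds $T$ for given $\alpha$ and $\ell$ with $\alpha' = \alpha/2$, so that the covering radius and the total number of queries across the $\ell$ coordinates can be inspected.

```python
import math


def make_grid(alpha, l):
    """Grid T = {0, a'/l, 2a'/l, ..., 1} with a' = alpha / 2."""
    step = (alpha / 2) / l
    # Small tolerance so floating-point noise does not add a spurious point.
    m = math.ceil(1 / step - 1e-9)
    return [min(1.0, j * step) for j in range(m + 1)]


if __name__ == "__main__":
    alpha, l = 0.2, 2
    grid = make_grid(alpha, l)
    print(len(grid), l * len(grid), 2 * l * l / alpha)
```

The count $\ell \cdot |T|$ exceeds $2\ell^2/\alpha$ only by the $\ell$ extra endpoint queries, which does not affect the asymptotic choice of $k$.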
Releasing distance queries via embedding into $\ell_1$

Input: A set of data points $D$. A set of query points $Q$. A 1-sensitive embedding $\pi$ from $(X, d)$ to
$([0, 1]^\ell, \|\cdot\|_1)$.

1. Construct a proxy database $D'$ for releasing $\ell_1$ distances by letting $\pi(x) \in D'$ for every $x \in D$.

2. Use the $(\epsilon, \delta)$-differentially private mechanism (resp., $\epsilon$-differentially private mechanism) for
releasing $\ell_1$ distance queries to answer $\frac{1}{n} \sum_{x \in D} \|\pi(x) - \pi(y)\|_1$ for every $y \in Q$ and release them
as the answers to $\frac{1}{n} \sum_{x \in D} d(x, y)$, respectively.

Figure 2: An $(\epsilon, \delta)$-differentially private mechanism $M_{\epsilon,\delta}$ (resp., $\epsilon$-differentially private mechanism $M_\epsilon$) for
releasing distance queries via embedding into $\ell_1$

4 Releasing Arbitrary Distance Queries via 1-Sensitive Metric Embeddings

In this section, we will discuss how to release answers to distance queries with respect to other metric spaces.
Our approach is to reduce the problem to releasing answers to $\ell_1$ distance queries via metric embeddings.
Recall that an embedding from a metric space $(X, d)$ to another metric space $(Y, d')$ is a mapping $\pi : X \to Y$.
The usefulness of an embedding is measured by how much the embedding distorts the distance between
any pair of points.

Note that for the purpose of answering distance queries, the usual definition of distortion is too strong 
in the sense that the usual notion of distortion considers the worst case distortion for every pair of points 
in the metric space while we only need to preserve the distances between every data-query pair. So in this 
paper, we will consider the following weaker notion of expansion, contraction, and distortion of metric 
embeddings. 

Definition 5. Recall that $(X, d)$ is the metric space of the distance query release problem and $D$ and $Q$ are
the set of data points and the set of query points respectively. The expansion of an embedding $\pi$ from $(X, d)$
to another metric space $(Y, d')$ is

$\max_{x \in X,\, y \in Q} \frac{d'(\pi(x), \pi(y))}{d(x, y)}$.

The contraction of the embedding is

$\max_{x \in X,\, y \in Q} \frac{d(x, y)}{d'(\pi(x), \pi(y))}$.

The distortion of the embedding is the product of its expansion and contraction.
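The following sketch evaluates these three quantities on a finite sample, taking the maximum over point-query pairs only, as in Definition 5. The helper names and the example embedding (the identity on the plane scaled by $1/\sqrt{2}$, mapping Euclidean distance into $\ell_1$) are illustrative assumptions, not constructions from the paper.

```python
import math


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def expansion_contraction(points, queries, d, d_prime, embed):
    """Expansion, contraction and distortion over point-query pairs."""
    expansion, contraction = 0.0, 0.0
    for x in points:
        for y in queries:
            orig = d(x, y)
            if orig == 0:
                continue  # coincident points carry no distortion information
            emb = d_prime(embed(x), embed(y))
            expansion = max(expansion, emb / orig)
            contraction = max(contraction, orig / emb if emb > 0 else math.inf)
    return expansion, contraction, expansion * contraction


def scaled_identity(p):
    """Hypothetical embedding of the Euclidean plane into l1."""
    return tuple(a / math.sqrt(2) for a in p)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.2, 0.9)]
    qs = [(1.0, 1.0), (0.3, 0.3)]
    print(expansion_contraction(pts, qs, l2, l1, scaled_identity))
```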

In the rest of this section, we will choose the target metric $(Y, d')$ to be the $\ell_1$ metric space $([0, 1]^\ell, \|\cdot\|_1)$
and we will always scale the embedding such that the expansion is 1.

4.1 1-Sensitive Metric Embeddings 

Suppose we are given such an embedding from $(X, d)$ to $([0, 1]^\ell, \|\cdot\|_1)$ with expansion 1 and contraction $C$.
In some cases, the dimension $\ell$ of the target $\ell_1$ space may depend on the contraction $C$. We will embed both
the data points and the query points into $([0, 1]^\ell, \|\cdot\|_1)$ and release distance queries via the mechanism
for $\ell_1$. Concretely, consider the mechanisms $M_{\epsilon,\delta}$ and $M_\epsilon$ given in Figure 2.
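The two steps of Figure 2 can be sketched as follows. This is not the paper's mechanism: as a simple stand-in for the private $\ell_1$ release algorithm, it uses the standard Laplace mechanism with the budget $\epsilon$ split evenly over the $k$ queries. It also assumes the embedding is 1-sensitive and clips embedded distances to $[0, 1]$ so that each query is $1/n$-sensitive. The function names are hypothetical.

```python
import random


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def release_via_embedding(data, queries, embed, eps, rng):
    """Embed, build the proxy database, and answer noisy average l1 distances.

    Stand-in baseline: each of the k queries receives budget eps / k, and
    the Laplace scale is sensitivity / budget = (1 / n) / (eps / k).
    """
    proxy = [embed(x) for x in data]  # step 1: proxy database D'
    n, k = len(proxy), len(queries)
    answers = []
    for y in queries:  # step 2: answer each embedded query
        py = embed(y)
        avg = sum(min(1.0, l1(px, py)) for px in proxy) / n
        answers.append(avg + laplace(k / (eps * n), rng))
    return answers


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0.1,), (0.5,), (0.9,)]
    queries = [(0.0,), (0.5,)]
    print(release_via_embedding(data, queries, lambda p: p, eps=1.0, rng=rng))
```

The per-query noise of this baseline grows linearly in $k$, which is what the iterative-construction mechanisms of the earlier sections are designed to avoid.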

Let us first consider the accuracy of these mechanisms. The mechanisms will lose a multiplicative factor
due to the embedding and an additive factor due to answering the $\ell_1$ queries privately. More precisely,
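Theorem 7. If the embedding $\pi$ has expansion 1 and contraction $C$, and if we use the $(\epsilon, \delta)$-differentially
private mechanism to release answers for $\ell_1$ distance queries, then with probability at least $1 - \beta$ the
mechanism $M_{\epsilon,\delta}$ answers every distance query $y \in Q$ with accuracy

$\frac{1}{C} \cdot \frac{1}{n} \sum_{x \in D} d(x, y) - \alpha_{\epsilon,\delta} \le M_{\epsilon,\delta}(D, y) \le \frac{1}{n} \sum_{x \in D} d(x, y) + \alpha_{\epsilon,\delta}$,

where $\alpha_{\epsilon,\delta}$ is the additive error of the $(\epsilon, \delta)$-differentially private mechanism for $\ell_1$ distance
queries (Theorem 1). If we use the $\epsilon$-differentially private mechanism to release answers for $\ell_1$ distance
queries, then with probability at least $1 - \beta$ the mechanism $M_\epsilon$ answers every query $y \in Q$ with
accuracy

$\frac{1}{C} \cdot \frac{1}{n} \sum_{x \in D} d(x, y) - \alpha_\epsilon \le M_\epsilon(D, y) \le \frac{1}{n} \sum_{x \in D} d(x, y) + \alpha_\epsilon$,

where $\alpha_\epsilon$ is the additive error of the $\epsilon$-differentially private mechanism for $\ell_1$ distance queries (Theorem 1).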



Remark 7.1. If the embedding is nearly isometric, i.e., we can achieve contraction $1 + \alpha$ for any small
$\alpha > 0$ by embedding into an $\ell(\alpha)$-dimension $\ell_1$ space, then we will choose the optimal additive error
bound such that

,,,>(M 9/5 ' 
a e ,s = O 

and 

a e = O 




n 2/3 £ 2/3 




n 2/3 £ 2/3 



Proof. Let us prove the error bound for $\epsilon$-differential privacy. The proof of the error bound for $(\epsilon, \delta)$-differential
privacy is similar. We will view the embedding $\pi$ as from $(X, d)$ to $(\pi(X), \|\cdot\|_1)$. Since the embedding
$\pi$ has expansion 1, the image $\pi(X)$ of $X$ has diameter 1 as well. Let $M^{\ell_1}_\epsilon$ denote the $\epsilon$-differentially
private mechanism for releasing answers to the $\ell_1$ distance queries. Then we have that

$M_\epsilon(y, D) = M^{\ell_1}_\epsilon(\pi(y), D')$
$\le \frac{1}{n} \sum_{x \in D} \|\pi(x) - \pi(y)\|_1 + \alpha_\epsilon$
$\le \frac{1}{n} \sum_{x \in D} d(x, y) + \alpha_\epsilon$.

The proof of the lower bound is similar, hence omitted. □

Next we will turn to the privacy guarantee of the mechanism. Since we are using either an $(\epsilon, \delta)$-differentially
private mechanism or an $\epsilon$-differentially private mechanism for releasing answers to the $\ell_1$
distance queries with respect to the proxy database $D'$, it suffices to ensure that the embeddings of neighboring
databases remain neighboring databases. In general, the embedding of some point $x$ may be defined
in terms of other data points $y$, which would violate this condition. Formally, we want our embeddings to
be 1-sensitive:

Definition 6. An embedding $\pi$ from $(X, d)$ to $([0, 1]^\ell, \|\cdot\|_1)$ is 1-sensitive if changing a data point $x_i \in D$
will only change the embedding $\pi(x_i)$ of $x_i$ and will not affect the embedding $\pi(x_j)$ of other $x_j \in D$ for any
$j \ne i$.
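A toy contrast may help here. In the sketch below (all names hypothetical, one-dimensional data for brevity), an embedding that ignores the database changes only the image of the replaced point, whereas an embedding that centers by the database mean moves every image and therefore is not 1-sensitive.

```python
def oblivious_embed(x, data):
    """Ignores the database: the image of x depends on x alone."""
    return (x,)


def centered_embed(x, data):
    """Data-dependent: subtracts the database mean, so every image can move."""
    mean = sum(data) / len(data)
    return (x - mean,)


def num_changed_images(embed, data, neighbor):
    """Count positions whose image differs between two neighboring databases."""
    return sum(embed(a, data) != embed(b, neighbor) for a, b in zip(data, neighbor))


if __name__ == "__main__":
    data = [0.1, 0.4, 0.7]
    neighbor = [0.9, 0.4, 0.7]  # differs from data in the first point only
    print(num_changed_images(oblivious_embed, data, neighbor))
    print(num_changed_images(centered_embed, data, neighbor))
```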
Theorem 8. If the embedding $\pi$ is 1-sensitive, and if we use the $(\epsilon, \delta)$-differentially private mechanism
(resp., $\epsilon$-differentially private mechanism) for releasing answers to the $\ell_1$ distance queries, then the mechanism
$M_{\epsilon,\delta}$ (resp., $M_\epsilon$) is $(\epsilon, \delta)$-differentially private (resp., $\epsilon$-differentially private).

Proof. For any two neighboring databases $D_1$ and $D_2$, the resulting proxy databases $D'_1$ and $D'_2$ in Figure 2
will either be the same or be neighboring databases since the embedding $\pi$ is 1-sensitive. Since we are
using an $\epsilon$-differentially private mechanism for releasing $\ell_1$ distances over the proxy databases, we get that
for any set of queries $Q$ and for any subset $S$ of possible answers,

$\Pr[M_\epsilon(D_1, Q) \in S] = \Pr[M^{\ell_1}_\epsilon(D'_1, \pi(Q)) \in S]$
$\le \exp(\epsilon) \Pr[M^{\ell_1}_\epsilon(D'_2, \pi(Q)) \in S]$
$= \exp(\epsilon) \Pr[M_\epsilon(D_2, Q) \in S]$.

So mechanism $M_\epsilon$ is $\epsilon$-differentially private. The proof for $(\epsilon, \delta)$-differential privacy is similar and hence
omitted. □

Remark 8.1. In principle, we can also consider $s$-sensitive embeddings for small $s$. However, we are not
aware of any useful embeddings of this kind. So we will focus on 1-sensitive embeddings in this paper.

Remark 8.2. If the embedding $\pi$ is independent of the set $Q$ of queries, then the mechanisms in Figure 2 can
be made interactive or non-interactive by using the interactive or non-interactive mechanisms respectively
for releasing answers to the $\ell_1$ distance queries. If the embedding is a function of the query set, then the
mechanism will be non-interactive, because potentially all of the queries may be needed to construct the
embedding of the database.

4.2 Releasing Euclidean Distance via an Oblivious Embedding

Let us consider releasing distance queries with respect to Euclidean distance. From the metric embedding
literature we know that there exists an almost isometric embedding from $\ell_2$ to $\ell_1$. More precisely,

Lemma 9 (E.g., [FLM77, Ind06]). There is an embedding $\pi$ from $([0, 1]^\ell, \|\cdot\|_2)$ to $([0, 1]^{\ell'}, \|\cdot\|_1)$ with
expansion 1 and contraction $1 + \alpha$, where the target dimension $\ell'$ depends on $\ell$ and $\alpha$. Further, this embedding
can be probabilistically constructed in polynomial time by defining each coordinate as a random projection.

Since the above embedding is based on random projections, it is 1-sensitive and independent of the set
$Q$ of queries. Thus we can plug this embedding into our framework in Figure 2, and the following theorem
for releasing Euclidean distances follows from Theorem 7, Remark 7.1, Theorem 8, and Remark 8.2.

Theorem 10. Suppose $(X, \|\cdot\|_2)$ is a subspace of the $\ell_2$ space with diameter 1. Then, there are polynomial
time interactive and non-interactive mechanisms for releasing answers to the $\ell_2$ distance queries that are
$(\epsilon, \delta)$-differentially private and $(\alpha_{\epsilon,\delta}, \beta)$-accurate for $\alpha_{\epsilon,\delta}$ satisfying



There are also polynomial time interactive and non-interactive mechanisms for releasing answers to the $\ell_2$
distance queries that are $\epsilon$-differentially private and $(\alpha_\epsilon, \beta)$-accurate for $\alpha_\epsilon$ satisfying

1-sensitive variant of Bourgain: Embedding an arbitrary metric space $(X, d)$ into $\ell_1$

Pre-processing: For $1 \le i \le \log k$ and $1 \le j \le K$, where $K$ is chosen to be $512(\log k + \log n)$, choose
a random subset $S_{ij}$ of the query points by picking each query point $y$ independently with probability
$2^{-(i-1)}$.

Embedding: Given $x$ in the metric space $(X, d)$, embed it into $\{\pi_{ij}(x)\}_{0 < i \le \log k,\, 1 \le j \le K}$ in the
$O(K \log k)$-dimension $\ell_1$ space by letting $\pi_{ij}(x) = \frac{1}{K \log k} d(x, S_{ij})$.

Figure 3: A randomized 1-sensitive embedding of an arbitrary metric space $(X, d)$ into an
$O(\log^2 k + \log k \log n)$-dimension $\ell_1$ space with $O(\log k)$ distortion

The omitted poly-log factors depend on $\ell$, $n$, and $\beta^{-1}$ (and $\delta^{-1}$ for $(\epsilon, \delta)$-differential privacy) in the offline
setting. In the interactive setting, this factor also depends on $\log k$. We remark again that in the offline
setting, the constructed data structure can answer all $\ell_2$ queries.

We remark that Lemma 9 also holds for $\ell_p$ metrics for $p \in (1, 2)$ (e.g., [FLM77]). So the results stated
in Theorem 10 also apply to $\ell_p$ metrics for $p \in (1, 2)$. Details are omitted.
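The random-projection idea can be illustrated with a short sketch. It is a simplified assumption-laden version, not the construction of [FLM77] or [Ind06]: each coordinate is a Gaussian projection, scaled by $\sqrt{\pi/2}/\ell'$ so that the $\ell_1$ distance between images equals the Euclidean distance in expectation. It does not shift or clip the images into $[0, 1]^{\ell'}$, and it does not enforce expansion exactly 1. The names `make_projection` and `embed` are hypothetical.

```python
import math
import random


def make_projection(dim, target_dim, rng):
    """Draw the random directions once; they depend on neither data nor queries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(target_dim)]


def embed(x, proj):
    """Each coordinate is a scaled random projection of x."""
    scale = math.sqrt(math.pi / 2) / len(proj)
    return [scale * sum(g * a for g, a in zip(row, x)) for row in proj]


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    rng = random.Random(0)
    proj = make_projection(dim=5, target_dim=2000, rng=rng)
    x = [0.1, 0.9, 0.3, 0.5, 0.7]
    y = [0.6, 0.2, 0.8, 0.5, 0.1]
    true = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    print(true, l1(embed(x, proj), embed(y, proj)))
```

Because `proj` is sampled without looking at the database, changing one data point changes only that point's image, which is the obliviousness property the text relies on.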

4.3 Releasing Distances for General Metric via Bourgain's Theorem 

In this section, we will consider releasing distance queries with respect to an arbitrary metric $(X, d)$ by
embedding it into an $\ell_1$ metric. Bourgain's theorem (e.g., [Bou85, LLR95]) suggests that for any $m$ points
in the metric space, there is an embedding into an $O(\log^2 m)$-dimensional $\ell_1$ space with distortion $O(\log m)$.
Unfortunately, this embedding is not oblivious and does not have low sensitivity. However, recall that for the
purpose of releasing distance queries, we only need to preserve the distances between all data-query pairs.
In other words, it is acceptable for the distances between data points (and likewise, between query points)
to be highly distorted. Further, we show that for this weaker notion of embedding, there is a variation of
Bourgain's theorem using an embedding that is oblivious to the data points, and hence has sensitivity 1.

Concretely, we will consider the embedding given in Figure 3. The idea is to define the embedding only
using the query points and we will show this is enough to preserve the distances from any point in the metric
space to the query points with high probability. Formally, we will prove the following theorem.

Theorem 11 (1-Sensitive Variant of Bourgain). In the embedding given in Figure 3, for any data point
$x \in X$ and any query point $y \in Q$, with probability at least $1 - \frac{1}{n^2 k}$, we have

$\frac{1}{64 \log k} d(x, y) \le \|\pi(x) - \pi(y)\|_1 \le d(x, y)$.

The proof of the above theorem is very similar to one of the proofs for Bourgain's original theorem. The
expansion bound is identical. The contraction bound will only guarantee that the embedded distance of a pair
of points $x \in D$ and $y \in Q$ satisfies $d'(\pi(x), \pi(y)) \ge \Omega(\frac{1}{\log k})(d(x, y) - d(x, Q))$. We observe that the
additive loss of $O(\frac{1}{\log k}) d(x, Q)$ can be avoided by using an additional $O(\log k + \log n)$ dimensions in the
embedding. We include the proof below for completeness.

Proof. Expansion: By triangle inequality, $|\pi_{ij}(x) - \pi_{ij}(y)| = \frac{1}{K \log k} |d(x, S_{ij}) - d(y, S_{ij})| \le \frac{1}{K \log k} d(x, y)$.
Summing over $1 \le i \le \log k$ and $1 \le j \le K$ we have $\sum_{i=1}^{\log k} \sum_{j=1}^{K} |\pi_{ij}(x) - \pi_{ij}(y)| \le d(x, y)$.

Contraction: Let us first define some notation. Let $r_i$ and $r'_i$ denote the smallest radius such that the closed
ball (with respect to metric $(X, d)$, similar hereafter) $B(x, r_i)$ and $B(y, r'_i)$ respectively contains at least
$2^{i-1}$ query points. Let $r^*_i = \max\{r_i, r'_i\}$. We will have that $r^*_i$ is non-decreasing in $i$. Let $i'$ denote the
largest index such that $r^*_{i'} + r^*_{i'-1} < d(x, y)$. Redefine $r^*_{i'}$ to be $d(x, y) - r^*_{i'-1}$. We have $r^*_{i'} \ge \frac{d(x, y)}{2}$. We
will need the following lemmas.

Lemma 12. For any $1 < i \le i'$, we have $\sum_{j=1}^{K} |\pi_{ij}(x) - \pi_{ij}(y)| \ge \frac{1}{32 \log k} (r^*_i - r^*_{i-1})$ with probability at
least $1 - \frac{1}{n^2 k \log k}$.

Proof. Suppose $r^*_i = r_i$ (the other case is similar). Consider the open ball $B^\circ(x, r^*_i)$ and the closed ball
$B(y, r^*_{i-1})$. By definition, the number of query points in $B^\circ(x, r^*_i)$ is less than $2^{i-1}$, and the number of query
points in $B(y, r^*_{i-1})$ is at least $2^{i-2}$. Since for each $1 \le j \le K$, the set $S_{ij}$ picks each query point independently
with probability $2^{-(i-1)}$, the probability that $S_{ij} \cap B^\circ(x, r^*_i) = \emptyset$ is at least $(1 - 2^{-(i-1)})^{2^{i-1}} \ge \frac{1}{4}$,
while the probability that $S_{ij} \cap B(y, r^*_{i-1}) \ne \emptyset$ is at least $1 - (1 - 2^{-(i-1)})^{2^{i-2}} \ge 1 - e^{-1/2}$. In sum, with
probability at least $\frac{1}{4}(1 - e^{-1/2})$ we have both $S_{ij} \cap B^\circ(x, r^*_i) = \emptyset$ and $S_{ij} \cap B(y, r^*_{i-1}) \ne \emptyset$, which
indicates that $d(x, S_{ij}) \ge r^*_i$ and $d(y, S_{ij}) \le r^*_{i-1}$ and therefore

$|\pi_{ij}(x) - \pi_{ij}(y)| \ge \frac{1}{K \log k} (r^*_i - r^*_{i-1})$.    (2)

Further, by the additive form of the Chernoff-Hoeffding theorem, we get that with probability at least
$1 - \frac{1}{n^2 k \log k}$, (2) holds for at least $\frac{K}{32}$ of the indices $j$. So we conclude that with probability at least $1 - \frac{1}{n^2 k \log k}$,

$\sum_{j=1}^{K} |\pi_{ij}(x) - \pi_{ij}(y)| \ge \frac{1}{32 \log k} (r^*_i - r^*_{i-1})$. □

Lemma 13. $\sum_{j=1}^{K} |\pi_{1j}(x) - \pi_{1j}(y)| = \frac{1}{\log k} r^*_1$.

Proof. It is easy to see that $r'_1 = 0$ because $y$ itself is a query point and $r^*_1 = r_1 = d(x, Q)$. Note that for
every $j$, $S_{1j}$ equals the set of query points. So we always have $d(x, S_{1j}) = d(x, Q) = r^*_1$ and $d(y, S_{1j}) = 0$.
Therefore, $|\pi_{1j}(x) - \pi_{1j}(y)| = \frac{1}{K \log k} r^*_1$ for $j = 1, \dots, K$, and summing up completes the proof. □

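By Lemma 12 and union bound, with probability at least $1 - \frac{1}{n^2 k}$, we have

$\sum_{j=1}^{K} |\pi_{ij}(x) - \pi_{ij}(y)| \ge \frac{1}{32 \log k} (r^*_i - r^*_{i-1})$

for all $1 < i \le i'$. By Lemma 13 we have

$\sum_{j=1}^{K} |\pi_{1j}(x) - \pi_{1j}(y)| = \frac{1}{\log k} r^*_1$.

Summing them up we get that

$\sum_{i=1}^{\log k} \sum_{j=1}^{K} |\pi_{ij}(x) - \pi_{ij}(y)| \ge \frac{1}{32 \log k} r^*_{i'} \ge \frac{1}{64 \log k} d(x, y)$.

□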
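The embedding of Figure 3 can be sketched in a few lines. The sketch makes assumptions the text does not fix: logarithms are taken base 2 and rounded up, and the distance to an empty sampled set is defined as 0 (any fixed constant preserves the expansion bound). The names `build_subsets` and `embed` are hypothetical.

```python
import math
import random


def build_subsets(queries, n, rng):
    """Sample S_ij: at level i each query point is kept with probability 2^-(i-1)."""
    k = len(queries)
    levels = max(1, math.ceil(math.log2(k)))
    K = math.ceil(512 * (math.log2(k) + math.log2(n)))
    subsets = [
        [[q for q in queries if rng.random() < 2.0 ** -(i - 1)] for _ in range(K)]
        for i in range(1, levels + 1)
    ]
    return subsets, levels, K


def embed(x, subsets, levels, K, d):
    """pi_ij(x) = d(x, S_ij) / (K log k); an empty S_ij contributes 0 (assumption)."""
    out = []
    for level in subsets:
        for S in level:
            dist = min((d(x, s) for s in S), default=0.0)
            out.append(dist / (K * levels))
    return out


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    rng = random.Random(1)
    d = lambda a, b: abs(a - b)  # toy metric on the real line
    queries = [0.0, 0.3, 0.6, 1.0]
    data = [0.1, 0.45, 0.8, 0.95]
    subsets, levels, K = build_subsets(queries, len(data), rng)
    x, y = data[1], queries[3]
    print(d(x, y), l1(embed(x, subsets, levels, K, d), embed(y, subsets, levels, K, d)))
```

Only the query points are consulted when sampling the sets, so the image of a data point never depends on the other data points. The expansion bound holds deterministically for this sketch; the contraction bound holds only with the probability stated in Theorem 11.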



As a corollary of Theorem 11 and union bound we have

Corollary 14. In the embedding given in Figure 3, with probability at least $1 - \frac{1}{n}$ we have that for any data
point $x \in D$ and any query point $y$,

$\frac{1}{64 \log k} d(x, y) \le \|\pi(x) - \pi(y)\|_1 \le d(x, y)$.

Hence, there exists an embedding of an arbitrary metric to an $\ell_1$ metric with distortion $O(\log k)$ that is
1-sensitive because it is oblivious to the data points. The dimension of the resulting $\ell_1$ metric is $O(\log^2 k +
\log k \log n)$. We remark that the expansion guarantee may fail with some small probability, in which case
the diameter of our embedding may be greater than 1. This would appear to require us to move to an
$(\epsilon, \delta)$-privacy guarantee, but it does not: when computing $\ell_1$ distances between points $x, y$, we can instead
compute $\min(1, \|\pi(x) - \pi(y)\|_1)$. In the high probability event in which the expansion guarantee of the
embedding holds, this will be exactly equal to the true distance between the embeddings of the points $x$ and
$y$. In the small probability event in which the expansion guarantee fails, the resulting queries will remain
$1/n$-sensitive in the private data. So by combining Theorem 7, Theorem 8 and Theorem 11 we have the
following theorem.

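Theorem 15. For any metric space $(X, d)$, there is a non-interactive mechanism running in time poly$(n, k)$
for releasing answers to any $k$ distance queries with respect to $(X, d)$ that is $(\epsilon, \delta)$-differentially private,
such that with high probability it answers every query $y \in Q$ with accuracy

$\Omega\left(\frac{1}{\log k}\right) \cdot \frac{1}{n} \sum_{x \in D} d(x, y) - \tilde{O}\left(\frac{1}{n^{4/5} \epsilon^{4/5}}\right) \le M^{(X,d)}_{\epsilon,\delta}(D, y) \le \frac{1}{n} \sum_{x \in D} d(x, y) + \tilde{O}\left(\frac{1}{n^{4/5} \epsilon^{4/5}}\right)$.

There is also a non-interactive mechanism running in time poly$(n, k)$ for releasing answers to any $k$ distance
queries with respect to $(X, d)$ that is $\epsilon$-differentially private, such that with high probability it answers every
query $y \in Q$ with accuracy

$\Omega\left(\frac{1}{\log k}\right) \cdot \frac{1}{n} \sum_{x \in D} d(x, y) - \tilde{O}\left(\frac{1}{n^{2/3} \epsilon^{2/3}}\right) \le M^{(X,d)}_{\epsilon}(D, y) \le \frac{1}{n} \sum_{x \in D} d(x, y) + \tilde{O}\left(\frac{1}{n^{2/3} \epsilon^{2/3}}\right)$.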



Remark 15.1. Note that in this theorem, we require a dependence on k both in the running time and in the 
accuracy bounds. This is because the embedding itself is a function of all of the queries in the query class. 
This is also what requires us to restrict attention to the non-interactive setting. 



5 Conclusions 

We have shown that distance queries defined over an arbitrary metric can be privately answered using ef- 
ficient algorithms, circumventing known hardness results for less structured classes of linear queries. Our 
techniques crucially leveraged the metric structure of the queries, through our reliance on metric embed- 
dings. Identifying other kinds of query structure that can be used to design efficient private query release 
algorithms remains one of the most important directions in differential privacy. 

Towards this goal, we make a concrete conjecture. Let $X = [0, 1]^\ell$ be the $\ell$-dimensional unit rectangle
endowed with the Euclidean norm, and let $S \subset \{\phi : [0, 1]^\ell \to [0, 1]\}$ be the collection of predicates such
that for each $\phi \in S$:

1. $\phi$ is 1-Lipschitz: for all $x, y \in [0, 1]^\ell$, $|\phi(x) - \phi(y)| \le \|x - y\|_2$

2. $\phi$ is convex: for all $x, y \in [0, 1]^\ell$ and for all $t \in [0, 1]$, $\phi(tx + (1 - t)y) \le t\phi(x) + (1 - t)\phi(y)$

For each $\phi \in S$, define the query $f_\phi(D) = \frac{1}{n} \sum_{x \in D} \phi(x)$. Then:

Conjecture. Let $C = \{f_\phi : \phi \in S\}$ denote the set of 1-Lipschitz, convex linear queries defined over the
universe $X = [0, 1]^\ell$. There is a differentially private query release mechanism operating in the interactive
setting, that can answer any subset of $k$ queries from $C$ to additive error $O(\mathrm{poly}(\ell, \log(k))/\sqrt{n})$ with
per-query update time poly$(\ell, n)$.

Note that distance queries are a subset of convex, Lipschitz queries. Showing efficient algorithms for 
this entire set of queries would be an important step forwards towards the agenda of understanding the 
limitations of polynomial time private query release. We remark that if we remove the Lipschitz condition 
(and consider instead the class of all convex queries), then this class includes boolean conjunctions, which 
is already a challenge problem for efficient private query release. With the Lipschitz condition, this question 
is disjoint from (and possibly easier than) the question of efficiently releasing conjunctions. 
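The claim that distance queries are convex and Lipschitz can be spot-checked numerically. The sketch below (hypothetical helper names) samples random points and tests both conditions for the predicate $\phi_y(x) = \|x - y\|_2$. It does not check the range condition $\phi \in [0, 1]$, which additionally needs the diameter-1 normalization assumed elsewhere in the paper.

```python
import math
import random


def dist_query(y):
    """phi_y(x) = ||x - y||_2, the predicate behind a Euclidean distance query."""
    return lambda x: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def avg_query(phi, data):
    """f_phi(D) = (1/n) * sum of phi(x) over x in D."""
    return sum(phi(x) for x in data) / len(data)


def check_lipschitz_convex(phi, dim, trials, rng, tol=1e-9):
    """Randomized spot check of 1-Lipschitzness and convexity on [0, 1]^dim."""
    for _ in range(trials):
        x = [rng.random() for _ in range(dim)]
        z = [rng.random() for _ in range(dim)]
        t = rng.random()
        mix = [t * a + (1 - t) * b for a, b in zip(x, z)]
        gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
        if abs(phi(x) - phi(z)) > gap + tol:
            return False
        if phi(mix) > t * phi(x) + (1 - t) * phi(z) + tol:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    phi = dist_query([0.2, 0.7, 0.5])
    print(check_lipschitz_convex(phi, dim=3, trials=1000, rng=rng))
    print(avg_query(phi, [[0.2, 0.7, 0.5], [0.2, 0.7, 1.0]]))
```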

References 

[BDMN05] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: the
SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on
Principles of Database Systems, pages 128-138. ACM, 2005.

[BLR08] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to non-interactive
database privacy. In Proceedings of the 40th annual ACM Symposium on Theory of Computing,
pages 609-618. ACM, 2008.

[Bou85] Jean Bourgain. On Lipschitz embedding of finite metric spaces in Hilbert space. Israel Journal
of Mathematics, 52(1):46-52, 1985.

[BR11] Avrim Blum and Aaron Roth. Fast private data release algorithms for sparse queries. arXiv
preprint arXiv:1111.6842, 2011.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to
sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography,
pages 265-284. Springer, 2006.

[DN03] Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In Proceedings
of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems,
pages 202-210. ACM, 2003.

[DNR+09] Cynthia Dwork, Moni Naor, Omer Reingold, Guy N. Rothblum, and Salil Vadhan. On the
complexity of differentially private data release: efficient algorithms and hardness results. In
Proceedings of the 41st annual ACM Symposium on Theory of Computing, pages 381-390.
ACM, 2009.

[DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil Vadhan. Boosting and differential privacy. In
Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science, pages
51-60. IEEE, 2010.

[FFKN09] Dan Feldman, Amos Fiat, Haim Kaplan, and Kobbi Nissim. Private coresets. In Proceedings
of the 41st Annual ACM Symposium on Theory of Computing, pages 361-370. ACM, 2009.






[FLM77] Tadeusz Figiel, Joram Lindenstrauss, and Vitali D. Milman. The dimension of almost spherical
sections of convex bodies. Acta Mathematica, 139(1):53-94, 1977.



[GHRU11] Anupam Gupta, Moritz Hardt, Aaron Roth, and Jonathan Ullman. Privately releasing conjunc- 
tions and the statistical query barrier. In Proceedings of the 43rd annual ACM Symposium on 
Theory of Computing, pages 803-812. ACM, 2011. 

[GRU12] Anupam Gupta, Aaron Roth, and Jonathan Ullman. Iterative constructions and private data 
release. In Proceedings of the 9th Conference on Theory of Cryptography, pages 339-356. 
Springer, 2012. 

[HR10] Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy- 
preserving data analysis. In Proceedings of the 51st IEEE Annual Symposium on Foundations 
of Computer Science, pages 61-70. IEEE, 2010. 

[HRS12] Moritz Hardt, Guy N. Rothblum, and Rocco A. Servedio. Private data release via learning 
thresholds. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, 
pages 168-187. SIAM, 2012. 

[Ind01] Piotr Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proceedings
of the 42nd IEEE Symposium on Foundations of Computer Science, pages 10-33. IEEE, 2001. 

[Ind06] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream com- 
putation. Journal of the ACM, 53(3):307-323, 2006. 

[JT12] Prateek Jain and Abhradeep Thakurta. Mirror descent based database privacy. Approximation, 
Randomization, and Combinatorial Optimization: Algorithms and Techniques, pages 579-590, 
2012. 

[LLR95] Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some of its 
algorithmic applications. Combinatorica, 15(2):215-245, 1995. 

[RR10] Aaron Roth and Tim Roughgarden. Interactive privacy via the median mechanism. In Proceed- 
ings of the 42nd ACM Symposium on Theory of Computing, pages 765-774. ACM, 2010. 

[TUV12] Justin Thaler, Jonathan Ullman, and Salil Vadhan. Faster algorithms for privately releasing 
marginals. Automata, Languages, and Programming, pages 810-821, 2012. 

[Ull12] Jonathan Ullman. Answering $n^{2+o(1)}$ counting queries with differential privacy is hard. arXiv
preprint arXiv:1207.6945, 2012.






